18 research outputs found
Comparing the hierarchy of keywords in on-line news portals
The tagging of on-line content with informative keywords is a widespread
phenomenon from scientific article repositories through blogs to on-line news
portals. In most cases, the tags on a given item are free words chosen
by the authors independently. Therefore, the relations among keywords in a
collection of news items are unknown. However, the topics and concepts
described by these keywords usually form a latent hierarchy, with the
more general topics and categories at the top, and more specialised ones at the
bottom. Here we apply a recent, cooccurrence-based tag hierarchy extraction
method to sets of keywords obtained from four different on-line news portals.
The resulting hierarchies show substantial differences not just in the topics
rendered as important (being at the top of the hierarchy) or of less interest
(categorised low in the hierarchy), but also in the underlying network
structure. This reveals discrepancies between the plausible keyword association
frameworks in the studied news portals.
Effects of time window size and placement on the structure of aggregated networks
Complex networks are often constructed by aggregating empirical data over
time, such that a link represents the existence of interactions between the
endpoint nodes and the link weight represents the intensity of such
interactions within the aggregation time window. The resulting networks are
then often considered static. More often than not, the aggregation time window
is dictated by the availability of data, and the effects of its length on the
resulting networks are rarely considered. Here, we address this question by
studying the structural features of networks emerging from aggregating
empirical data over different time intervals, focussing on networks derived
from time-stamped, anonymized mobile telephone call records. Our results show
that short aggregation intervals yield networks where strong links associated
with dense clusters dominate; the seeds of such clusters or communities become
already visible for intervals of around one week. The degree and weight
distributions are seen to become stationary around a few days and a few weeks,
respectively. An aggregation interval of around 30 days yields the most stable
networks, in the sense that consecutive windows are most similar. For longer
intervals, the effects of weak or random links become increasingly strong, and the
average degree of the network keeps growing even for intervals up to 180 days.
The placement of the time window is also seen to affect the outcome: for short
windows, different behavioural patterns play a role during weekends and
weekdays, and for longer windows it is seen that networks aggregated during
holiday periods are significantly different. Comment: 19 pages, 11 figures
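The aggregation procedure described in this abstract can be illustrated with a minimal sketch. It is not the authors' code: the event format `(timestamp, caller, callee)`, the function names, and the use of Jaccard similarity of link sets to compare consecutive windows are all assumptions made for illustration.

```python
from collections import Counter

def aggregate(events, t_start, t_end):
    """Build a weighted, undirected network from events in [t_start, t_end).

    events: iterable of (timestamp, caller, callee) tuples (hypothetical format).
    Returns a Counter mapping each unordered node pair to its number of calls.
    """
    weights = Counter()
    for t, u, v in events:
        if t_start <= t < t_end and u != v:
            weights[frozenset((u, v))] += 1
    return weights

def link_jaccard(w1, w2):
    """Jaccard similarity of the link sets of two aggregated networks."""
    a, b = set(w1), set(w2)
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def average_degree(weights):
    """Average degree, 2 * links / nodes, over nodes with at least one link."""
    nodes = set().union(*weights)
    return 2 * len(weights) / len(nodes) if nodes else 0.0

# Toy data with timestamps in days (invented for this example).
events = [
    (0.5, "a", "b"), (1.2, "a", "b"), (2.0, "b", "c"),
    (8.1, "a", "b"), (9.0, "c", "d"),
]
week1 = aggregate(events, 0, 7)
week2 = aggregate(events, 7, 14)
similarity = link_jaccard(week1, week2)
```

Varying `t_end - t_start` changes the window length, and shifting `t_start` changes its placement, which are the two effects the abstract studies.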
Identifying Overlapping and Hierarchical Thematic Structures in Networks of Scholarly Papers: A Comparison of Three Approaches
We implemented three recently proposed approaches to the identification of
overlapping and hierarchical substructures in graphs and applied the
corresponding algorithms to a network of 492 information-science papers coupled
via their cited sources. The thematic substructures obtained and overlaps
produced by the three hierarchical cluster algorithms were compared to a
content-based categorisation, which we based on the interpretation of titles
and keywords. We defined sets of papers dealing with three topics located on
different levels of aggregation: h-index, webometrics, and bibliometrics. We
identified these topics with branches in the dendrograms produced by the three
cluster algorithms and compared the overlapping topics they detected with one
another and with the three pre-defined paper sets. We discuss the advantages
and drawbacks of applying the three approaches to paper networks in research
fields. Comment: 18 pages, 9 figures
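Coupling papers "via their cited sources" is commonly known as bibliographic coupling. The sketch below shows one simple way to build such a network, assuming the coupling strength is the raw number of shared references; the abstract does not state which weighting was used, and the function name and toy data are hypothetical.

```python
from itertools import combinations

def bibliographic_coupling(references):
    """Return {(paper_a, paper_b): number of shared cited sources}.

    references: dict mapping a paper id to the set of sources it cites.
    Only pairs with at least one shared source get a link.
    """
    edges = {}
    for p, q in combinations(sorted(references), 2):
        shared = len(references[p] & references[q])
        if shared:
            edges[(p, q)] = shared
    return edges

# Toy reference lists (invented for this example).
refs = {
    "P1": {"s1", "s2", "s3"},
    "P2": {"s2", "s3"},
    "P3": {"s4"},
}
coupling = bibliographic_coupling(refs)
```

The resulting weighted edge list is the kind of input a hierarchical or overlapping clustering algorithm would take.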
Community landscapes: an integrative approach to determine overlapping network module hierarchy, identify key nodes and predict network dynamics
Background: Network communities help the functional organization and
evolution of complex networks. However, developing a method that is
both fast and accurate and that provides modular overlaps and partitions of a
heterogeneous network has proven to be rather difficult. Methodology/Principal
Findings: Here we introduce the novel concept of ModuLand, an integrative
method family determining overlapping network modules as hills of an influence
function-based, centrality-type community landscape, and including several
widely used modularization methods as special cases. As various adaptations of
the method family, we developed several algorithms, which provide an efficient
analysis of weighted and directed networks, and (1) determine pervasively
overlapping modules with high resolution; (2) uncover a detailed hierarchical
network structure allowing an efficient, zoom-in analysis of large networks;
(3) allow the determination of key network nodes and (4) help to predict
network dynamics. Conclusions/Significance: The concept opens a wide range of
possibilities to develop new approaches and applications including network
routing, classification, comparison and prediction. Comment: 25 pages with 6 figures and a Glossary + Supporting Information
containing pseudo-codes of all algorithms used, 14 Figures, 5 Tables (with 18
module definitions, 129 different modularization methods, 13 module
comparison methods) and 396 references. All algorithms can be downloaded
from this web-site: http://www.linkgroup.hu/modules.ph
An Evaluation of Community Detection Algorithms on Large-Scale Email Traffic
Community detection algorithms are widely used to study the structural properties of real-world networks. In this paper, we experimentally evaluate the qualitative performance of several community detection algorithms using large-scale email networks. The email networks were generated from real email traffic and contain both legitimate email (ham) and unsolicited email (spam). We compare the quality of the algorithms with respect to a number of structural quality functions and a logical quality measure, which assesses the ability of the algorithms to separate ham and spam emails by clustering them into distinct communities. Our study reveals that the algorithms that perform well with respect to structural quality do not achieve high logical quality. We also show that algorithms with similar structural quality have similar logical quality, regardless of their approach to clustering. Finally, we reveal that the algorithm that performs link community detection is more suitable for clustering email networks than the node-based approaches, and that it creates more distinct communities of ham and spam edges.
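The abstract does not define its logical quality measure. As an illustration only, one plausible stand-in is label purity: the fraction of items that carry the majority label (ham or spam) of their community. The function name, the input format, and the choice of purity itself are assumptions, not the paper's definition.

```python
from collections import Counter, defaultdict

def label_purity(membership, labels):
    """Fraction of items whose label matches their community's majority label.

    membership: dict mapping an item (node or edge id) to its community id.
    labels: dict mapping the same items to "ham" or "spam".
    A value of 1.0 means every community is purely ham or purely spam.
    """
    by_community = defaultdict(Counter)
    for item, community in membership.items():
        by_community[community][labels[item]] += 1
    majority = sum(max(counts.values()) for counts in by_community.values())
    return majority / len(membership) if membership else 0.0

# Toy assignment (invented for this example): community 0 is mostly ham,
# community 1 is purely spam.
membership = {"e1": 0, "e2": 0, "e3": 0, "e4": 1, "e5": 1}
labels = {"e1": "ham", "e2": "ham", "e3": "spam", "e4": "spam", "e5": "spam"}
purity = label_purity(membership, labels)
```

Because the function only needs an item-to-community mapping, the same sketch applies to node-based and link-based (edge) communities.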